Frameshift alignment: statistics and post-genomic applications
Abstract
MOTIVATION: The alignment of DNA sequences to proteins, allowing for frameshifts, is a classic method in sequence analysis. It can help identify pseudogenes (which accumulate mutations), analyze raw DNA and RNA sequence data (which may have frameshift sequencing errors), investigate ribosomal frameshifts, etc. Often, however, only ad hoc approximations or simulations are available to provide the statistical significance of a frameshift alignment score.
RESULTS: We describe a method to estimate statistical significance of frameshift alignments, similar to classic BLAST statistics. (BLAST presently does not permit its alignments to include frameshifts.) We also illustrate the continuing usefulness of frameshift alignment with two 'post-genomic' applications: (i) when finding pseudogenes within the human genome, frameshift alignments show that most anciently conserved non-coding human elements are recent pseudogenes with conserved ancestral genes; and (ii) when analyzing metagenomic DNA reads from polluted soil, frameshift alignments show that most alignable metagenomic reads contain frameshifts, suggesting that metagenomic analysis needs to use frameshift alignment to derive accurate results.
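The abstract compares the proposed method to classic BLAST statistics but does not spell out the formulas. As a reminder of what that classic form looks like, here is a minimal sketch of the Karlin-Altschul E-value, E = K·m·n·exp(-λS), together with the usual bit score and P-value conversions. This is not the paper's frameshift-specific procedure: the function names are made up for illustration, and the λ and K values in the usage lines are placeholders, since in practice they must be estimated for the particular scoring scheme.

```python
import math


def evalue(score, m, n, lam, K):
    """Expected number of chance local alignments with score >= `score`.

    m, n : lengths of the two sequences (the search space is m * n)
    lam, K : Karlin-Altschul parameters for the scoring scheme
             (assumed to be known or estimated elsewhere)
    """
    return K * m * n * math.exp(-lam * score)


def bit_score(score, lam, K):
    """Normalized score in bits, so that E = m * n * 2**(-bits)."""
    return (lam * score - math.log(K)) / math.log(2)


def pvalue(e):
    """Probability of at least one chance alignment, P = 1 - exp(-E)."""
    return -math.expm1(-e)


if __name__ == "__main__":
    # Placeholder parameters, for illustration only.
    lam, K = 0.267, 0.041
    e = evalue(score=60, m=300, n=1_000_000, lam=lam, K=K)
    print("E-value:", e)
    print("bits:", bit_score(60, lam, K))
    print("P-value:", pvalue(e))
```

Working in bits makes scores comparable across scoring schemes, which is why the E-value can be written in terms of the bit score and the search space alone.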
Similar resources
Nutrigenomics and its Applications in Animal Science
Nutrigenomics applies genomic technologies to study how nutrients affect the expression of genes. With the advent of the post-genomic era and the use of functional genomic tools, new strategies for evaluating the effects of nutrition on production efficiency and nutrient utilization are becoming available. Nutrigenomics plays an efficient role in various fields of animal health like nutrit...
Frameshift Mutations (Deletion at Codon 1309 and Codon 849) in the APC Gene in Iranian FAP Patients: a Case Series and Review of the Literature
Familial adenomatous polyposis (FAP) is responsible for <1% of colorectal cancer (CRC) cases and is inherited as an autosomal dominant trait. Patients generally present with hundreds to thousands of adenomas and develop colorectal cancer by age 35-40 if left untreated. Here we report four patients with germline frameshift mutation (small deletion) at exon 15 of adenomatous polyposis coli (APC) tumo...
Alignment of protein-coding sequences with frameshift extension penalties
We introduce an algorithm for the alignment of protein-coding sequences accounting for frameshifts. The main specificity of this algorithm as compared to previously published protein-coding sequence alignment methods is the introduction of a penalty cost for frameshift extensions. Previous algorithms have only used constant frameshift penalties. This is similar to the use of scoring schemes with...
Post-processing long pairwise alignments
MOTIVATION The local alignment problem for two sequences requires determining similar regions, one from each sequence, and aligning those regions. For alignments computed by dynamic programming, current approaches for selecting similar regions may have potential flaws. For instance, the criterion of Smith and Waterman can lead to inclusion of an arbitrarily poor internal segment. Other approach...
The post-genomic era of biological network alignment
Biological network alignment aims to find regions of topological and functional (dis)similarities between molecular networks of different species. Then, network alignment can guide the transfer of biological knowledge from well-studied model species to less well-studied species between conserved (aligned) network regions, thus complementing valuable insights that have already been provided by g...
Journal title:
- Bioinformatics
Volume 30, Issue 24
Pages: -
Publication date: 2014